Simplifying Parallel List Traversal
Abstract
Computations described using Bird's constructive algebra of lists are nicely amenable to parallel implementation. Indeed, by using higher-order functions, ordered list traversals such as foldl and foldr can be expressed as unordered reductions. Based on this observation, a set of optimizations has been developed for list traversals in the parallel Haskell (pH) compiler [12]. These optimizations are inspired by, and partially subsume, earlier work on the optimization of sequential list traversal.

1 Introduction

Lists are a basic computational "glue" in many functional languages. They are easily expressed and understood by programmers, and they offer a compact notation for representing collections of data. The functional language Haskell [7] is no exception to this rule, and in fact encourages or even requires the use of lists to express certain sorts of computation.

Functional languages also encourage a compositional coding style, which causes lists to be used extensively as intermediate data structures. For example (all code is in Haskell [7]):
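The abstract's key observation, that higher-order functions let foldl and foldr be expressed as unordered reductions, can be illustrated with a minimal sketch. It assumes nothing about how the pH compiler actually performs the transformation: each list element is turned into a function, and the functions are combined with composition. Because composition is associative with identity id, the combining step may be grouped as a balanced tree instead of strictly left to right or right to left, which is the freedom a parallel reduction needs. The names treeReduce, foldlTree, and foldrTree are hypothetical, and the tree shape here only demonstrates that the grouping is free; it does not itself create parallel work.

```haskell
-- Hypothetical sketch: foldl and foldr expressed as reductions over
-- function composition. Names are illustrative, not from the pH compiler.

-- Reduce a list with an associative operator `op` whose identity is `z`,
-- grouping the combinations as a balanced binary tree. For such an `op`,
-- the result equals foldr op z xs, whatever the grouping.
treeReduce :: (a -> a -> a) -> a -> [a] -> a
treeReduce _  z []  = z
treeReduce _  _ [x] = x
treeReduce op z xs  = treeReduce op z left `op` treeReduce op z right
  where
    (left, right) = splitAt (length xs `div` 2) xs

-- foldl f z [x1, ..., xn] == (g xn . ... . g x1) z, where g x acc = f acc x.
-- `flip (.)` composes left to right, so x1's step is applied first.
foldlTree :: (b -> a -> b) -> b -> [a] -> b
foldlTree f z xs = treeReduce (flip (.)) id [\acc -> f acc x | x <- xs] z

-- foldr f z [x1, ..., xn] == (f x1 . ... . f xn) z.
foldrTree :: (a -> b -> b) -> b -> [a] -> b
foldrTree f z xs = treeReduce (.) id (map f xs) z
```

For instance, foldlTree (-) 0 [1, 2, 3] agrees with foldl (-) 0 [1, 2, 3] and evaluates to -6, while foldrTree (-) 0 [1, 2, 3] agrees with foldr and gives 2. Composition is associative but not commutative, so the elements keep their list order; only the parenthesization changes. The repeated length and splitAt calls make this sketch O(n log n), which is fine for illustration but not what a compiler would generate.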
Similar resources
Understanding the SIMD Efficiency of Graph Traversal on GPU
The graph is a widely used data structure, and graph algorithms such as breadth-first search (BFS) are regarded as key components of a great number of applications. Recent studies have attempted to accelerate graph algorithms on highly parallel graphics processing units (GPUs). Although many graph algorithms based on large graphs exhibit abundant parallelism, their performance on GPU still faces for...
Using Graph Properties to Speed-up GPU-based Graph Traversal: A Model-driven Approach
While it is well-known and acknowledged that the performance of graph algorithms is heavily dependent on the input data, there has been surprisingly little research to quantify and predict the impact the graph structure has on performance. Parallel graph algorithms, running on many-core systems such as GPUs, are no exception: most research has focused on how to efficiently implement and tune di...
Lock-free deques and doubly linked lists
We present a practical lock-free shared data structure that efficiently implements the operations of a concurrent deque as well as a general doubly linked list. The implementation supports parallelism for disjoint accesses and uses atomic primitives which are available in modern computer systems. Previously known lock-free algorithms of doubly linked lists are either based on non-available atom...
Improved Parallel Processing of Massive De Bruijn Graph for Genome Assembly
The de Bruijn graph is a widely used technique in current genome assembly software. The scale of this kind of graph can reach billions of vertices and edges, which poses great challenges to the genome assembly task. It is of great importance to study scalable genome assembly algorithms in order to cope with this situation. Despite some recent works which begin to address the scalability...
An FMM Based on Dual Tree Traversal for Many-core Architectures
The present work attempts to integrate the independent efforts in the fast N-body community to create the fastest N-body library for many-core and heterogeneous architectures. Focus is placed on low-accuracy optimizations, in response to the recent interest in using FMM as a preconditioner for sparse linear solvers. A direct comparison with other state-of-the-art fast N-body codes demonstrates th...